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Abstract. We present an application of the Richardson-Lucy algorithm to the analysis of color-magnitude diagrams by convert- 
ing the CMD into an image and using a restoring point spread function function (psf) derived from the known, often complex, 
sources of error We show numerical experiments that demonstrate good recovery of the original image and establish conver- 
gence rates for ideal cases with single gaussian uncertainties and poisson noise using a statistic. About 30-50 iterations 
suffice. As an application, we show the results for a particular case, the Hipparcos sample of the solar neighborhood where the 
uncertainties are mainly due to parallax which we model with a composite weighted gaussian using the observed error distribu- 
tions. The resulting psf has a slightly narrower core and broader wings than a single gaussian. The reddening and photometric 
errors are considerably reduced by restricting the sample to within 80 pc and to My < 3.5. We find that the recovered image, 
which has a narrower, better defined main sequence and a more clearly defined red giant clump, can be used as input to stellar 
evolution modeling of the star formation rate in the solar vicinity and, with more contributing uncertainties taken into account, 
for general Galactic and extragalactic structure and population studies. 

Key words. (Stars:) Hertzsprung-Russell (HR) and C-M diagrams; Galaxy: evolution; Methods: statistical; (Galaxy:) solar 
neighbourhood; Galaxy: stellar content 



1. Introduction 

The color-magnitude diagram (CMD) is the key tool for study- 
ing evolution in stellar populations. However, when interpret- 
ing cluster and field CMDs we have to face different problems. 
For clusters, unless they are very nearby and large, the stars 
can be assumed to be at nearly the same distance and there- 
fore even if the absolute luminosities are unknown their relative 
values are well determined. Unless the system is very young, 
of order the pre-main sequence contraction time, the stars can 
be assumed to have the same age and metallicity since their 
formation occurs essentially instantaneously. The uncertainties 
are, consequently, mainly photometric and also due to inter- 
stellar reddening (but this, like the distance, can initially be as- 
sumed identical for all members) or in crowded fields due to 
confusion. In contrast, for field stars, the uncertainties affect- 
ing the CMD are mainly due to parallax errors and reddening, 
which are different for each star, so the relative as well as ab- 
solute luminosities are inherently uncertain. Since the aim of 
field studies is to determine the structure and the star forma- 
tion and metallicity histories of the solar neighborhood, any 
sample must be assumed to have formed over a long time. To 
unravel these histories requires a different approach than usu- 



ally adopted for the treatment of the aggregate CMD. Our aim 
in this study is to show how to obtain a cleaned input for such 
analyses that is, in effect, a restoration of the intrinsic distri- 
bution to the limit of the errors. A CMD may be regarded as 
an image, the 'intensity' being the number of stars in a bin of 
some photometric indices or effective temperature and lumi- 
nosity, affected by a point spread function (psf) that originates 
from the error distributions of the parallaxes, interstellar red- 
dening, and photometry and we propose using the same tech- 
niques that have been developed for image restoration. 

Specifically, we apply the Bayesian Richardson-Lucy al- 
gorithm (Richardson II 9721 Lucv ll974l l to an observed CMD. 
This method is very well known in the astronomical commu- 
nity and has been broadly applied to recover data from images 
and histograms (see, e.g. Bertero & Boccacci 120051 and refer- 
ences therein). Here we suggest another use, this time for the 
study of stellar populations. If we grid a CMD by building a 
two dimensional histogram in, say. My and B - V, the data is 
an image blurred by a psf that is the matrix produced by the 
observational errors. The analysis then becomes a deconvolu- 
tion problem. ' From this point of view, the loss of information 
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' There is, however, a difference with respect to a real astronomical 
image. There is no "background", or if it is due to contamination of 
the sample by stars at very different distances this can be modeled 
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about single stars is compensated by the opportunity to analyze 
the sample data in a statistically consistent sense using imag- 
ing methods. Our intent is to recover, as closely as possible 
within the limits of the psf, the intrinsic CMD for compari- 
son with stellar isochrones that may then be analyzed by any 
of many available statistical methods (see especially Tolstoy 
& S aha . 1996 Hernandez et al. .1999.2000 1. Although in what 
follows we use the Hipparcos solar neighborhood sample for 
illustration,^ the technique is as general as the RL algorithm it- 
self and we emphasize that our aim is to propose a new view 
of how to treat observatioanl CMD data for stellar and galactic 
evolution. 



2. Use of the algorithm 

Bayes' theorem provides the relationship between the probabil- 
ity of a hypothesis (model parameters) conditional on a given 
data, P{M\D) (posterior), and the probability of the data con- 
ditional on the hypothesis, P(D\M), and the prior probability 
P{M): 



P{M\D) 



P(D\M)P(M) 
j P{D\M)P{M) 



(1) 



where M are the values for the parameters of the model (the 
hypothesis) and D are the measured data. The prior is the prob- 
ability of any hypothesis being true before empirical data, the 
posterior probability is what we can know about hypothesis 
after the observation (how likely, given the observations, our 
model is as an explanation of the data); P(D\M), the likeli- 
hood, is the chance of the observation D„ given the model M„ 
for some sample n (e.g. Jaynes 2003). First, assuming the in- 
trinsic noise in the data is poisson distributed, the conditional 
probability is given by: 



exp(-M„)M„^" 



n=0 



(2) 



where the product is taken over N bins. The intrinsic fluctua- 
tions are poisson distributed. 

The psf is from the photometric and parallax errors. The 
data value is the number of stars in the nth bin which result 
from sampling some initial mass function (IMF) 0(m) at a star 
formation rate (SFR), i/'(f), and depending on some metallicity 
history, Z(f); if we fix the other two are the functions should 
be recoverable from the sample. The relevance to CMDs is that, 
depending on the type of data, there may be several contribut- 
ing factors to the total psf, K e.g. errors in photometry, paral- 
laxes, and proper motions, and uncertainties in reddening, un- 
resolved binary companions, etc.. The combination of blurring 
and fluctuations on the final image is: 

(image) - K -k (object) -i- N (3) 

within the kernel. The errors here are strictly poissonian as a result of 
the binning. 

^ The technique was developed as part of a study of the star for- 
mation history of the solar neighborhood (Cignoni 2006 1 and a more 
complete study of the results of its application is in preparation. 



where N is the noise. Richardson ( |1972 ) and Lucy < 19741 1 were 
the first to propose an iterative inversion algorithm (hereinafter 
R-L algorithm) for deconvolution: 



ik) 



D 



(4) 



with is the object estimation after the k-th iteration and D 
is the data. The method is derived assuming poisson noise and 
taking a constant prior information P{M) in equation (non- 
informative prior). It can be shown that the algorithm converges 
to a restoration that when re-blurred by the psf is close to the 
data in a maximum likelihood sense (for details see e.g. Shepp 
& Vardi 1982^. 

Although the Hipparcos data is the subject of our appli- 
cation of the R-L algorithm, its possible use extends to any 
binned CMD. Since the specific deconvolution problem must 
be preceded by a careful analysis of the associated uncertain- 
ties, the following section discusses the sources of blur in the 
Hipparcos data. In the following sections we show tests based 
on convolving synthetic CMDs, mimicking the solar neigh- 
borhood Hipparcos sample, with different psfs, and performed 
restorations using the R-L method. We show examples of ex- 
periments that demonstrate successful image restoration under 
various assumed point spread functions. The same analysis was 
performed after adding poisson noise and using the statistic 
as the convergence parameter The last section shows the result 
of applying the method to the real data using an observationally 
derived psfwd discuss some applications. 

3. Hipparcos uncertainties 

Our sample was selected using two criteria. The stars are all 
brighter than V ~ 8, which is comfortably within the com- 
pleteness interval assumed by the Hipparcos collaboration of 
between 7.3 and 9 mag depending on Galactic latitude and 
color (e.g. PeiTyman et al. 1995), and are within 80 pc from 
the Sun, coiTesponding to an absolute magnitude of 3.5 mag 
at our adopted limit. The parallax precision is generally bet- 
ter than 10% and the nonlinearity bias is assumed very small 
(e.g. Arenou & Luri 1999 ). The histogram of the absolute vi- 
sual magnitude error distribution for our sample, obtained by 
propagating the parallax errors, (see fig. 1), shows a mean error 
of about 0. 10 mag with a standard deviation of about 0.05 mag. 
The same figure shows the absolute magnitude eiTor distribu- 
tion for stars in three bins of absolute visual magnitude, respec- 
tively 2-2.5, 2.5-3, 3-3.5 mag. The histograms for stars with 
2 < My < 2.5 and 2.5 < My < 3 are the same within the pois- 
son fluctuations, while the histogram 3-3.5 presents a slightly 
systematic shift (about 0.01 - 0.02 mag) toward bigger errors. 
This last feature may be due to the selection in absolute magni- 
tude. Moreover, the parallax precision decreases with distance 
so the larger absolute magnitude errors found in the 3-3.5 mag 
could be explained by a greater mean distance of the sample. 
It is, however, a small effect so the distribution for the absolute 
magnitude error, to fair accuracy, was assumed to be indepen- 
dent of the position in the CMD as a first assumption about the 
psf. The photometric error for stars brighter than My - 3.5 is 
generally < 0.02 mag but there is also an high eiTor tail due to 
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Fig. 1. The absolute magnitude error distribution for Hipparcos 
stars brighter than My = 3.5 and nearer than 80 pc for different 
intervals of absolute magnitude. The error bar is poissonian. 
The figure shows the general agreement among the curves. 

giant and clump stars. These last groups comprise only 1 - 2% 
of the total sample and since the color uncertainty is so small, 
compared to the absolute magnitude error, we here assume that 
it can be neglected. The method we'll present can, however, be 
easily generalized to two dimensions (for instance in treating 
crowding and other photometric errors in observations of fields 
in resolved galaxies). 

4. Tests with artificial data 

Before using our method to real data, we have to test its 
ability in restoring CMDs on the basis of artificial data. We 
begin by modeling the psf with a gaussian. Considering the 
relative uncertainties in color and absolute magnitude we 
chose cr - Q.\ mag, the mean value found in the observational 
distribution. The error on B - V is negligible so the convolution 
and deconvolution affect only the absolute magnitudes but the 
method can be generalized to two dimensions. The synthetic 
CMD (fig. 2-a) was generated with a Monte Carlo technique 
(see for details Castellani et al. 2002 and Cignoni et al. I2003> 
using a power law with a Salpeter exponent (see Kroupa 2001i, 
a flat SFR (i/r - constant) and an observational age-metallicity 
relation Z(t) from Nordstrom et al. 120041 The artificial stars 
were placed on grid of tracks (Pisa Evolutionary Library^*). 
No binaries have been included but this is not an inherent 
limitation if appropriate estimates are available for their 
contributions (e.g. Hurley & Tout 1998 1. The synthetic CMD 
contains approximately the same number of stars brighter 
than My - 3.5 of the selected Hipparcos CMD, so we can 
avoid possible differences due to the statistical fluctuations. 
Figure 2-a shows the synthetic CMD and Fig. |2l-b shows the 
pixelated version, binned with a step size of 0.05 mag in both 
coordinates so the /?.s/has a cr of 2 pixels. This choice of the 
bin size is a practical compromise: smaller values produce 
statistical fluctuations too large which could be a problem 
because noise amplification is a real drawback with iterative 
algorithms. Larger bins reduce or eliminate the effectiveness of 

3 http://astro.df.unipi.it/SAA/PEL/ZO.html 



the restoration, reducing also the potential information content 
of the dataset. In our case, the observational uncertainty in 
B — V is, negligible compared to the mean error in the absolute 
magnitude so we did not test the effects of bluiTing in color 
and do not need any finer binning. Choosing 0.05 mag for the 
bin size minimized the problems mentioned above. 

Convolving the synthetic CMD with a gaussian (cr -2 pix- 
els) produces the image shown in figure 2-c. The result after 50 
iterations is shown in figure 2-d. The restored image is quite 
close to the original. To quantify this match we needed a con- 
vergence criterion (a rule that stops the algorithm when further 
iterations do not significantly improve the result). Since the 
original papers it has been convenient to take x'' this con- 
trol parameter (see Richardson ll972l Lucv l 19741 Bi & Borner 
IT9941 Molina et al. i200lt : 



where 'g is the blurred image and g is the estimate of the ob- 
served image resulting after the feth iteration. Calculating the;^f^ 
for each iteration we found a monotonically decreasing trend 
with an asymptotic value close to zero. This is because the syn- 
thetic CMD has been convolved with a pi/ without additional 
noise so R-L algorithm can perfectly recover the original CMD. 
To gain a better understanding of how the algorithm converges, 
we tried a broader psf with cr - 6 pixels. The blurred image 
and the restoration results are shown in figures 2-e-f. 

Thus far we have treated the case of ideal data (i.e. with- 
out noise). However, an important point when trying to recover 
the underlying image from noisy data is the sensitivity of the 
restoration algorithm to noise and its tendency to produce ar- 
tifacts with successive iterations. The problem is that any con- 
volution is a smoothing operation and, in the Fourier domain, 
this reduces the high spatial frequencies that characterizes the 
small scales of the image. Methods like the R-L algorithm at- 
tempt to remove the smoothing, but this is not straightforward 
since noise is also present. As originally suggested by Lucy 
(p'9'74), the estimate f'^ of the real distribution does not con- 
verge as Z*^^' - /*—> as ^ —> oo; in fact, after a best estimate 
of the agreement get worse and artifacts may appear because 
of small scale fluctuations. Since the R-L algorithm is not regu- 
larized, when the iteration number increases noise amplifies in 
the solution. In practice, the iteration number must be limited to 
find an acceptable compromise between resolution and stabil- 
ity. To test how the statistical uncertainties present in the data 
can influence the deconvolved image, we blurred a CMD with 
a pi/ with a cr = 2 pixels and added poisson noise (see figure 
The restored CMD after 50 iterations is shown in figure 
|3l-b. As expected, the result is very diff'erent from the case from 
the previous case: the addition of poisson noise prevents the re- 
covery of the exact morphology of the original CMD. In fact, 
with continued iterations the CMD becomes grainier The re- 
sult is better for the most populated regions of the CMD, those 
with high signal to noise ratio, where the CMD features are not 
buried by noise. We find that the;(f^, after ~ 50 - 100 iterations, 
reaches a plateau (see figure^J and stabilizes around this value. 
This actually illustrates the utility of the image approach since 
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Fig. 2. Synthetic tests demonstrating the recovery achieved using the R-L algorithm. From top left: (a) Synthetic CMD, (b) 2-D 
histogram for the synthetic CMD; (c) blurred with 2 pixels gaussian psf; (d) restored image after 50 iterations. For contrast, (e) 
synthetic image blurred with 6 pixels gaussian; (f) restored image for 6 pixels blurring after 50 iterations (see text for details). 



such changes can be easily modeled in a controlled way and 
the changes can be statistically monitored in a way that boot- 
strapping, for instance, cannot show. In order to preserve the 
CMD morphology, since the algorithm improves the statistical 
properties of the sample but destroys local information, the it- 
erations are stopped after about 50 cycles when the bulk of the 
restoration has been achieved. 



5. Application to real data 

Passing to real data, the first problem is to account for the distri- 
bution of parallax eiTors. In principle, we cannot adopt a mean 
error value as we did in the previous simulations {cr -2 pixels). 
If we do not take this point into account and blindly apply the 
algorithm with a cr = 2 pixels wide gaussian, the restorations 
give the;t'^ values shown in figure|5](star symbol). A more re- 
alistic approach is to use the full information contained in the 
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Iteration 50 




Fig. 3. (a) 2 pixels blurring plus poisson noise; (b) restored 
CMD after 50 iterations. 



data error distribution by forming a psf from a linear combina- 
tion of gaussians using the data shown in Fig.^ 



K 



-y- 

A ^ Hi 



1 



exp 



v2^ 



(6) 



where the A factor is a normalization constant, the weights H, 
are the histogram values from figure[l] i.e. the fraction of stars 
with absolute magnitude error between {AMv)i and (AMv)i+6), 
and the cr, are the values (AMv)i ■ The resulting psf {see figure 
6) is nearly symmetric with a narrower core and broader wings 
than a gaussian with the same full width, and with the disper- 
sion given by cr = Yjj o'j/Hj, the weighted histogram. We ob- 
tain the sequence of restorations shown in figure using this 
composite psf. Now the;\f^ converges faster (see the curve with 
filled circles in figure|5} to smaller values so the iterations can 
be stopped sooner. The main results of the restoration process 
is to compact the CMD features along the deconvolution axis 
(absolute magnitude axis). We find that (a) the red clump re- 
gion is compressed and new features appear, (b) the giant and 
sub-giant regions are both better defined, and (c) the main se- 



1550 



1500 



1450 - 



1400 - 



1350 - 



1300 




40 60 
n iterations 



Fig. 4. versus iterations number (case with blurring and 
noise). 
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Fig. 5.x^ versus iteration number for the restoration with gaus- 
sian psfistsa) and pseudo-gaussian psf (filled circles). 



quence blue edge is now sharper. These improvements in the 
CMD may appear small, but the restored CMD is the clean- 
est restored data we expect to find that is consistent with the 
uncertainties in a Bayesian sense.'* 

6. Conclusions 

The aim of this study has been to demonstrate a method for 
recovering as much information as possible from binned color- 
magnitude diagrams that can then serve as input for evolution- 



As an alternate stopping criterion, we suggest dividing the CMD 
into specific critical regions and using a weighted /A' value with the 
degrees of freedom A' depending on region. For instance, a region that 
is empty and remains always empty need not be included in the total 
number for the reduced A possible weighting for CMD analyses 
that includes extra information coming from the astrophysical setting 
is to weight the regions by their relative "importance". Since the most 
information (in a statistical sense, Jaynes 2003) comes from the least 
probable event, this can be incorporated into the convergence criterion. 
The relative probability of finding a star in any part of the CMD for 
a cluster or the field sample depends on the relative rate of evolution, 
for instance on the giant brach or for clump stars or stars at the tumoff 
point, and diminishes the importance of the main sequence stars in the 
reconstruction. 
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Fig. 7. (a) Hipparcos CMD. (b) 2-D histogram for the Hipparcos CMD. (c)-(d) Restorations with the pseudo-gaussian psf(tiie 
number of iteration is labeled). 




Fig. 6. The solid line is a gaussian function with cr - 2 pixels. 
The dashed line represents the function obtained from a linear 
combination of gaussians with standard deviations and weights 
given by data. 

ary studies of stellar ensambles. The resulting reconstructions 
should be the best cleaned data set with which to perform anal- 
yses of the star formation rates and metallicity evolution over 
time for complex samples. The advantage of this approach over 
bootstrapping comes from the ease of including in the hkeh- 



hood function a broad range of processes (known or hypothet- 
ical) that affect the location of stars within the observed CMD. 
Although we have concentrated here on a particular algorithm, 
the Richardson-Lucy method, there is in principle no restric- 
tion and others, e.g. maximum entropy, could be used instead. 
A study of the star formation history of the solar neighborhood 
will be the topic of a future paper (Cignoni et al. 2006, in prepa- 
ration). 
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